A Novel Approach to Classification in Financial Applications
Abstract
Modern methods for classification analysis involve processes for “learning” to correctly assign elements of a data set to certain classes. In many settings, the learning process is supervised; that is, the classes to which the training data belong are known in advance. In many other settings, however, the classes are not known a priori, and a process based on unsupervised learning is necessary. We present a novel, two-stage unsupervised learning methodology for the classification problem. Stage one consists of a special clustering method, based on a quadratic unconstrained optimization model, that finds optimal classes for the data. Stage two uses enhanced mathematical programming models to classify the data into the optimal classes found in stage one. Computational testing shows that a significant advantage of our approach is its ability to yield more meaningful classifications than previously achieved in a variety of settings. We report the outcome of training and testing our method on various data sets from the data mining literature, with specific applications in finance. The comparative results show the effectiveness and versatility of the approach and its merit as a tool for modeling and solving practical problems.

Introduction

Many classification and discriminant analysis applications involve supervised learning, in which the training data are labeled with the appropriate class. In some settings, however, the class definition itself may be subjective or ambiguous. For example, bond rating agencies such as Moody’s and Standard & Poor’s each have a proprietary algorithm for rating bonds. These algorithms may produce different rating scales and, thus, different assessments of the risk of the same underlying bond. In such instances, it is unclear whether one class definition is better than another.
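Stage one is described above only as a quadratic unconstrained optimization model. One common way to cast clique-partitioning clustering in that form is a penalty model, assumed here for illustration; the exact model used in this paper may differ in its weights and penalty. Let binary $x_{ik} = 1$ when item $i$ is placed in cluster $k$, with $n$ items and at most $K$ clusters. Let $w_{ij}$ be pair weights (hypothetical: negative for similar pairs, positive for dissimilar ones), and let $P > 0$ be a penalty that forces each item into exactly one cluster. Because $x_{ik}^2 = x_{ik}$, each item's penalty term expands to $P - P\sum_k x_{ik} + 2P\sum_{k<l} x_{ik}x_{il}$. The whole objective therefore becomes a UBQP $x^\top Q x$ plus a constant:

```latex
f(x) = \sum_{k=1}^{K} \sum_{1 \le i < j \le n} w_{ij}\, x_{ik} x_{jk}
     + P \sum_{i=1}^{n} \Bigl(1 - \sum_{k=1}^{K} x_{ik}\Bigr)^{2}
     = x^{\top} Q\, x + nP, \qquad x \in \{0,1\}^{nK},

Q_{(i,k),(i,k)} = -P, \qquad
Q_{(i,k),(i,l)} = P \quad (k \neq l), \qquad
Q_{(i,k),(j,k)} = \tfrac{1}{2} w_{ij} \quad (i \neq j),
```

All other entries of the symmetric matrix $Q$ are zero. $P$ has to be large enough that leaving an item unassigned, or assigning it to two clusters, never lowers $f$.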
Furthermore, there is a certain amount of subjectivity in the class definition, inasmuch as the “experts” evaluating the elements of the different classes may disagree on the relative importance of each attribute used as a classification criterion. To overcome this problem, we propose a two-stage approach to the classification problem. The first stage clusters the data into “optimal” classes, and the second stage seeks to classify the data correctly into the optimal classes found in stage one. To cluster the data, we use the method described in Kochenberger et al. (2005), which formulates clique partitioning as an unconstrained binary quadratic program (UBQP). A tabu search (TS) procedure from Glover et al. (1999) is used to solve the UBQP efficiently. The classification stage is carried out by a multi-hyperplane mixed integer programming formulation for discriminant analysis, similar to those described in Better et al. (2006). The paper is organized as follows: Section 1 briefly describes our clustering algorithm; Section 2 describes a basic multi-hyperplane model for classifying data into two groups; Section 3 presents two examples that use real data to illustrate our approach; and Section 4 summarizes our results and conclusions.
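To make the clustering stage concrete, here is a minimal sketch in plain Python. It builds a UBQP matrix for clique partitioning and minimizes $x^\top Q x$ with tabu search. It rests on several assumptions:

- Every name in it (`build_qubo`, `flip_delta`, `tabu_search`, `decode`) is hypothetical.
- It uses a standard penalty formulation: pair weights `w[i][j]` reward or penalize placing items `i` and `j` in the same cluster, and `penalty` forces each item into exactly one of `k` clusters.
- The search is a plain one-flip tabu search with a short tenure and an aspiration criterion, started from the empty assignment. It is a simplification, not the procedure of Glover et al. (1999).

```python
from typing import List, Tuple

Matrix = List[List[float]]


def build_qubo(w: Matrix, k: int, penalty: float) -> Matrix:
    """Hypothetical helper: symmetric Q with x^T Q x = clique-partitioning
    objective (weights + one-cluster-per-item penalty) minus n * penalty.
    Variable (item i, cluster c) is stored at index i * k + c."""
    n = len(w)
    m = n * k
    q = [[0.0] * m for _ in range(m)]
    for i in range(n):
        for c in range(k):
            a = i * k + c
            q[a][a] = -penalty  # linear part of P * (1 - sum_c x_ic)^2
            for d in range(k):
                if d != c:
                    q[a][i * k + d] = penalty  # item i placed in two clusters
            for j in range(n):
                if j != i:
                    q[a][j * k + c] = w[i][j] / 2.0  # items i and j share cluster c
    return q


def objective(q: Matrix, x: List[int]) -> float:
    """Evaluate x^T Q x for a binary vector x."""
    m = len(x)
    return sum(q[a][b] * x[a] * x[b] for a in range(m) for b in range(m))


def flip_delta(q: Matrix, x: List[int], a: int) -> float:
    """Change in x^T Q x if bit a is flipped (Q symmetric, x binary)."""
    coupling = sum(q[a][b] * x[b] for b in range(len(x)) if b != a)
    return (1 - 2 * x[a]) * (q[a][a] + 2 * coupling)


def tabu_search(q: Matrix, iters: int = 500, tenure: int = 0) -> Tuple[List[int], float]:
    """Simplified one-flip tabu search minimizing x^T Q x, started from x = 0."""
    m = len(q)
    tenure = tenure or max(1, m // 4)
    x = [0] * m
    cur = 0.0
    best_x, best = x[:], cur
    tabu_until = [0] * m  # bit a is tabu while tabu_until[a] > current iteration
    for it in range(1, iters + 1):
        move, move_delta = -1, float("inf")
        for a in range(m):
            d = flip_delta(q, x, a)
            # Aspiration: a tabu move is allowed only if it beats the best so far.
            if tabu_until[a] > it and cur + d >= best:
                continue
            if d < move_delta:
                move, move_delta = a, d
        if move < 0:
            continue  # all moves tabu and none aspirates; wait for tabu to expire
        x[move] = 1 - x[move]  # take the best admissible move, even if uphill
        cur += move_delta
        tabu_until[move] = it + tenure
        if cur < best:
            best, best_x = cur, x[:]
    return best_x, best


def decode(x: List[int], n: int, k: int) -> List[int]:
    """Cluster label per item, or -1 if the one-cluster-per-item penalty is violated."""
    labels = []
    for i in range(n):
        chosen = [c for c in range(k) if x[i * k + c] == 1]
        labels.append(chosen[0] if len(chosen) == 1 else -1)
    return labels


if __name__ == "__main__":
    # Hypothetical weights: items {0, 1} and {2, 3} are similar (negative weight).
    w = [[0, -2, 3, 3], [-2, 0, 3, 3], [3, 3, 0, -2], [3, 3, -2, 0]]
    q = build_qubo(w, k=2, penalty=10.0)
    x, val = tabu_search(q)
    print(decode(x, n=len(w), k=2), val)  # prints [0, 0, 1, 1] -44.0
```

Adding back the dropped constant gives $x^\top Q x + nP = -44 + 4 \cdot 10 = -4$. That equals $w_{01} + w_{23}$, the total weight of the pairs that share a cluster.

The search accepts the best admissible move even when it is uphill. A pure descent would stop at any valid partition, because moving an item between clusters takes two flips.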
Publication date: 2006